GPX - Gardens Point XML IR at INEX 2005

Author

  • Shlomo Geva
Abstract

The INEX 2006 evaluation was based on the Wikipedia collection in XML format. It consisted of several tasks that required different approaches. In this paper we describe the approach that we adopted in an attempt to satisfy the requirements of all the tasks: Thorough, Focused, Relevant in Context, and Best in Context. We used the same underlying system for all tasks. The retrieval strategy is based on the construction of a collection sub-tree consisting of all nodes that contain one or more of the search terms. Nodes containing search terms are then assigned a score using the GPX ranking scheme, which incorporates and extends TF-IDF or BM25 variants. Scores are propagated upwards in the document XML tree, and finally all XML elements are ranked. We present results that demonstrate that the approach is versatile and produces consistently good performance. We also provide an empirical analysis of the GPX ranking scheme and compare its performance against baseline TF-IDF and BM25 scoring schemes.
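The pipeline outlined in the abstract (score the nodes that contain search terms, propagate scores up the XML tree, rank all elements) can be illustrated with a minimal sketch. This is not the GPX ranking scheme itself, whose formula is not given here: the per-term `weights`, the `DECAY` damping factor, the whitespace tokenizer, and all function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DECAY = 0.5  # hypothetical damping applied at each upward step (not from the source)


def own_text_score(elem, weights):
    """Weighted term-frequency score of the text directly inside `elem`."""
    # Text that follows a child (its tail) belongs to the parent element.
    text = (elem.text or "") + " " + " ".join(child.tail or "" for child in elem)
    counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
    return sum(counts[term] * weight for term, weight in weights.items())


def score_tree(elem, weights, path, results):
    """Post-order walk: score the children first, then propagate upwards."""
    child_total = 0.0
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        child_path = f"{path}/{child.tag}[{seen[child.tag]}]"
        child_total += score_tree(child, weights, child_path, results)
    score = own_text_score(elem, weights) + DECAY * child_total
    if score > 0:  # keep only nodes on the sub-tree that contains search terms
        results.append((path, score))
    return score


def rank_elements(xml_string, weights):
    """Return (path, score) pairs for all matching elements, best first."""
    root = ET.fromstring(xml_string)
    results = []
    score_tree(root, weights, f"/{root.tag}[1]", results)
    return sorted(results, key=lambda item: item[1], reverse=True)


doc = (
    "<article><sec><p>xml retrieval</p><p>cooking</p></sec>"
    "<sec><p>xml</p></sec></article>"
)
ranked = rank_elements(doc, {"xml": 1.0, "retrieval": 2.0})
for path, score in ranked:
    print(f"{score:.2f}  {path}")
```

In this sketch the term weights stand in for whatever TF-IDF or BM25 variant a real system would compute from collection statistics, and the damping factor keeps large ancestor elements from automatically outranking the specific elements that actually contain the terms.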


Related resources

Fine Tuning INEX

Since 2002, INEX has been the benchmark for evaluating XML information retrieval (XML-IR) systems. INEX has based much of its evaluation methodology on that of existing workshops, albeit modified for the specific requirements of XML-IR. Due to some of these modifications, the evaluation phase of INEX takes much longer than that of comparable workshops. Here, we investigate ways to spee...

What XML-IR Users May Want

It is assumed that, by focusing on retrieval at a granularity finer than whole documents, XML-IR systems will better satisfy users’ information needs than traditional IR systems. Participants in INEX’s Ad-hoc track develop XML-IR systems based upon this assumption, using an evaluation methodology in the tradition of Cranfield. However, since the inception of INEX, debate has raged on how applicable ...

NLPX at INEX 2005

XML information retrieval (XML-IR) systems aim to provide users with highly exhaustive and highly specific results. To interact with XML-IR systems users must express both their content and structural needs in the form of a structured query. Historically, these structured queries have been formatted using formal languages such as XPath or NEXI. Unfortunately, formal query languages are very com...

The simplest evaluation measures for XML information retrieval that could possibly work

This paper reviews several evaluation measures developed for evaluating XML information retrieval (IR) systems. We argue that these measures, some of which are currently in use by the INitiative for the Evaluation of XML Retrieval (INEX), are complicated, hard to understand, and hard to explain to users of XML IR systems. To show the value of keeping things simple, we report alternative evaluat...

EPRUM Metrics and INEX 2005

Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML IR in which retrievable information units are document elements. These units are neither predefined nor independent, and the elements returned by IR systems may overlap and contain near misses. Part of the problem stems from the classical hypotheses on the user behaviour that do not take into account the ...

Publication date: 2005